transformers #713
Conversation
@jiqing-feng Ran out of time today. We will check the two PRs and test tomorrow to see if there is anything we should change. The only thing I may want to change is to make sure transformers/optimum do not call any internal methods such as …
Agreed, I have integrated …
Hi @Qubitium. The optimum and transformers PRs have been verified on CPU; do you mind verifying them on CUDA? I always hit build issues when I build gptqmodel from source.
@jiqing-feng Ok. Can you show me your CUDA compile errors? I want to check whether they are related to our compiler flags and/or env.
@CSY-ModelCloud I see a 404 urllib error. Caused by our whl download code?
@jiqing-feng Please change the transformers and optimum PRs to draft mode until they pass tests. Right now they are not passing and some changes are required.
Got it. |
@jiqing-feng The biggest issue right now is that gptqmodel's internal format is gptq_v2, so directly using the quant-linear does not work for older quantized models, such as TheBloke's, or from other GPTQ quantizers that use gptq v1. The fix is that gptqmodel needs to receive the full …
We are currently discussing how best to go about this with minimal changes.
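For context, here is a minimal sketch of the kind of checkpoint conversion being discussed. It assumes, as is my understanding of the two formats, that gptq_v1 packs each zero-point offset by one relative to gptq_v2; the function name and the exact offset convention are assumptions for illustration, not gptqmodel's actual API:

```python
import torch

def convert_qzeros_v1_to_v2(qzeros: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Hypothetical gptq_v1 -> gptq_v2 zero-point conversion.

    Assumes v1 packs (zero - 1) into int32 words while v2 packs the raw
    zero, so each packed bit-field must be incremented by one.
    """
    fields_per_word = 32 // bits          # e.g. 8 nibbles per int32 at 4-bit
    mask = (1 << bits) - 1
    out = torch.zeros_like(qzeros)
    for i in range(fields_per_word):
        shift = bits * i
        field = (qzeros >> shift) & mask  # unpack one zero-point
        field = (field + 1) & mask        # apply the +1 offset, wrapping in-field
        out |= field << shift
    return out
```

Whatever the real conversion looks like, the point stands: the raw quant-linear cannot be reused directly across formats without a v1-to-v2 step somewhere in the load path.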
Unit tests #724: quantization and inference for GPTQ and GPTQ_v2 have been fixed. Created two PRs requesting to merge into jiqing-feng's branches: jiqing-feng/optimum, https://github.com/jiqing-feng/optimum/pull/1/files
This PR enables the transformers example.
For the optimum lib, see: huggingface/optimum#2064
For the transformers lib, see: huggingface/transformers#35012
Apply those two changes and this PR can run the example
transformers_usage.py
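As a rough sketch of what transformers_usage.py presumably exercises once both PRs are applied (the model id below is a placeholder for any pre-quantized GPTQ checkpoint on the Hub):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: any pre-quantized GPTQ checkpoint.
model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# With the linked PRs applied, transformers/optimum should route the
# quantized layers through gptqmodel instead of auto-gptq.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```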